A Multiplatform Chemometric Approach to Modeling of Mosquito Repellents
Table 9.3: Statistical parameters of the established linear QSAR models for prediction of Rindex of the set of natural and synthesized compounds

Regression model   R²       R²adj    R²cv     RMSE      F
ULR                0.6063   0.5845   0.5306   24.3495   27.7
MLR1               0.7289   0.6971   0.6552   20.7907   22.9
MLR2               0.8150   0.7804   0.7192   17.7024   23.5

that the introduction of an additional predictor variable (an increase in the number of variables)
improves the model’s quality more than would be expected by chance.
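
The adjusted R² is the usual statistic for this kind of check, since it rises only when a new predictor reduces the residual variance by more than the lost degree of freedom costs. The sketch below recomputes R²adj and F from the R² values in Table 9.3. The number of compounds (n = 20) and the number of predictors per model (1, 2 and 3) are not stated in this section; they are assumptions, chosen because they reproduce the tabulated values to within rounding.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for a model with p predictors fitted to n compounds."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def f_statistic(r2, n, p):
    """Overall F ratio of the regression (p and n - p - 1 degrees of freedom)."""
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))


# R^2 values are taken from Table 9.3.
# ASSUMPTION: n = 20 compounds and p = 1, 2, 3 predictors are inferred,
# not given in the text.
N_COMPOUNDS = 20
MODELS = {"ULR": (0.6063, 1), "MLR1": (0.7289, 2), "MLR2": (0.8150, 3)}

for name, (r2, p) in MODELS.items():
    r2_adj = adjusted_r2(r2, N_COMPOUNDS, p)
    f_ratio = f_statistic(r2, N_COMPOUNDS, p)
    print(f"{name}: R2adj = {r2_adj:.4f}, F = {f_ratio:.1f}")
```

Under these assumptions R²adj keeps increasing from ULR to MLR1 to MLR2, which is consistent with each added predictor contributing more than chance alone would.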
Another confirmation of the quality of QSAR models is the comparison between experimental
and predicted values, together with an analysis of the amplitude and randomness of the residuals
(absolute differences between the experimental and predicted values). In the ideal case, the
relationship between experimental and predicted values is described by R² = 1 and the
absolute values of the residuals are equal to zero. Quite extensive validation approaches,
including cross-validation, have been applied in the studies by De et al. 2018, Natarajan
et al. 2008 and Wang et al. 2017.
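
As an illustration of these checks, the following minimal sketch fits a one-descriptor linear model, returns its residuals and computes a leave-one-out cross-validated R². The data values are hypothetical, and leave-one-out is only one possible cross-validation scheme; the scheme behind R²cv in Table 9.3 is not specified here. Residuals are kept signed so that their scatter around zero can be inspected; abs() gives their amplitudes.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def r2_and_residuals(x, y):
    """R^2 of the fitted line and the signed residuals (experimental - predicted)."""
    a, b = fit_line(x, y)
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    my = sum(y) / len(y)
    ss_res = sum(r * r for r in resid)
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot, resid


def loo_q2(x, y):
    """Leave-one-out cross-validated R^2: each compound is predicted
    by a model fitted to all the other compounds."""
    my = sum(y) / len(y)
    press = 0.0  # predictive residual sum of squares
    for i in range(len(x)):
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (a + b * x[i])) ** 2
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - press / ss_tot


# HYPOTHETICAL descriptor values and activities, for illustration only.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
activity = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

r2, residuals = r2_and_residuals(descriptor, activity)
print("R2 =", round(r2, 4))
print("|residuals| =", [round(abs(r), 3) for r in residuals])
print("R2cv (leave-one-out) =", round(loo_q2(descriptor, activity), 4))
```

For a least-squares model the leave-one-out value cannot exceed the fitted R², so a large gap between the two is a warning sign of overfitting.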
9.3.5 Chemometric classification methods as a platform for repellents selection
9.3.5.1 Cluster analysis
Cluster analysis is one of the most favored chemometric pattern recognition techniques.
In the modeling of compounds with repellent activity, it can be applied to group
the compounds based on their molecular or bioactivity properties. The clustering
can be carried out as agglomerative clustering (each object is first treated as its own
cluster, and the clusters are then gradually merged into one group) or as divisive clustering
(one group is split into two, and each of these is then split in turn). Since a
hierarchy of clusters is built, this analysis is also known as hierarchical cluster analysis (HCA).
The results of HCA are usually presented in a visual form known as a dendrogram (Figure
9.5).
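
The agglomerative procedure can be sketched in a few lines. The example below is a minimal illustration that assumes Euclidean distances and single linkage (the distance between two clusters is the distance between their closest members); the linkage rule used for Figure 9.5 is not stated here, and the object labels and coordinates are hypothetical. The returned merge history is the information a dendrogram displays.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def agglomerative(points, labels):
    """Single-linkage agglomerative clustering.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    in the order in which the clusters were joined.
    """
    coords = dict(zip(labels, points))
    clusters = [(lab,) for lab in labels]  # every object starts alone
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters with the smallest single-linkage distance.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(euclidean(coords[a], coords[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# HYPOTHETICAL objects in a two-descriptor space, for illustration only.
labels = ["A", "B", "C", "D", "E"]
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]

for left, right, dist in agglomerative(points, labels):
    print(left, "+", right, "at distance", round(dist, 3))
```

Cutting the merge history at a chosen distance yields the flat clusters; objects joined at small distances are the most similar ones.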
The dendrogram presented in Figure 9.5 shows the grouping of natural repellents and
novel compounds synthesized by Thireou et al. 2018 in the space of their calculated
physicochemical descriptors: boiling point (BP), melting point (MP), critical temperature
(CT), critical pressure (CP), critical volume (CV), Gibbs energy (GE), lipophilicity
(logP), molar refractivity (MR), topological polar surface area (tPSA), calculated lipophilicity
descriptor (ClogP) and calculated molar refractivity (CMR). All the descriptors were
calculated with the ChemBioDraw Ultra 13.0 program (PerkinElmer Inc.). The dendrogram indicates
that some of the synthesized compounds are quite similar to the natural repellents in the
space of the calculated molecular features. The closest similarities are between n-butyl
cinnamate and compound Syn7, and between ethyl cinnamate and compound Syn4, whose
structures are presented in Figure 9.6. The HCA results in Figure 9.5 also show
two main clusters: one containing a group of seven synthetic compounds together with
lauric acid, and the other containing the rest of the compounds.
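
Because descriptors such as boiling point and logP are expressed on very different numerical scales, distances in descriptor space are commonly computed after autoscaling (column-wise z-scores). Whether this preprocessing was applied for Figure 9.5 is not stated, so the sketch below treats it as an assumption. The compound names and descriptor values are hypothetical and do not correspond to the compounds discussed above.

```python
import math
from statistics import mean, stdev


def autoscale(rows):
    """Column-wise z-scores: (value - column mean) / column standard deviation.

    Every column must vary; a constant column would give a zero
    standard deviation and a division by zero.
    """
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [stdev(c) for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]


def closest_pair(names, rows):
    """Return (distance, name_a, name_b) for the two most similar objects
    in the autoscaled descriptor space."""
    scaled = autoscale(rows)
    best = None
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            d = math.dist(scaled[i], scaled[j])
            if best is None or d < best[0]:
                best = (d, names[i], names[j])
    return best


# HYPOTHETICAL compounds with two descriptors on different scales
# (a boiling-point-like value and a logP-like value).
names = ["cmpd_1", "cmpd_2", "cmpd_3", "cmpd_4"]
rows = [[500.0, 2.0], [510.0, 2.1], [700.0, 2.05], [505.0, 4.0]]

distance, first, second = closest_pair(names, rows)
print(first, "and", second, "are closest; scaled distance =", round(distance, 3))
```

On the raw values, cmpd_1 and cmpd_4 would appear closest because the first descriptor dominates the distance; after autoscaling both descriptors carry equal weight and cmpd_1 pairs with cmpd_2 instead.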